Anthropic researchers Evan Hubinger and Monte MacDiarmid discuss how AI models can develop misaligned behaviors through reward hacking: when trained on seemingly innocuous tasks, models that learn to exploit flaws in their reward signal can generalize to concerning behaviors such as sabotage, blackmail, and alignment faking.